PoClustering: Lossless Clustering of Dissimilarity Data
Abstract
Given a set of objects V with a dissimilarity measure between pairs of objects in V, a PoCluster is a collection of sets P ⊂ powerset(V), partially ordered by the ⊂ relation, such that S ⊂ T iff the maximal dissimilarity among objects in S is less than the maximal dissimilarity among objects in T. PoClusters capture categorizations of objects that are not strictly hierarchical, such as those found in ontologies. In general, PoClusters cannot be constructed using hierarchical clustering algorithms. In this paper, we examine the relationship between PoClusters and dissimilarity matrices and prove that PoClusters are in one-to-one correspondence with the set of dissimilarity matrices. The PoClustering problem is NP-complete, and we present a heuristic algorithm for it. Experiments on both synthetic and real datasets demonstrate the quality and scalability of the algorithms.
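The correspondence between set collections and dissimilarity matrices described above can be illustrated on a toy example. The sketch below is not the paper's heuristic algorithm. It assumes (an assumption not stated in the abstract) that the collection can be built from the maximal cliques of the threshold graphs of the matrix, enumerates them by brute force, and then recovers every pairwise dissimilarity from the collection. The names `diameter`, `maximal_cliques`, `build_collection`, and `reconstruct`, as well as the matrix `D`, are hypothetical and chosen for illustration only.

```python
from itertools import combinations

def diameter(s, d):
    """Maximal pairwise dissimilarity within s (0 for a single object)."""
    return max((d[i][j] for i, j in combinations(sorted(s), 2)), default=0)

def maximal_cliques(n, d, t):
    """Brute-force maximal cliques of the graph with edge i-j when d[i][j] <= t.
    Exponential in n; only meant for tiny illustrative inputs."""
    cliques = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(range(n), r)
               if diameter(c, d) <= t]
    return {c for c in cliques if not any(c < other for other in cliques)}

def build_collection(d):
    """Assumed construction: collect maximal cliques at every distinct
    dissimilarity level, mapping each set to its diameter."""
    n = len(d)
    thresholds = sorted({d[i][j] for i, j in combinations(range(n), 2)})
    collection = {}
    for t in thresholds:
        for c in maximal_cliques(n, d, t):
            if len(c) > 1:
                collection[c] = diameter(c, d)
    return collection

def reconstruct(collection, n):
    """Recover the matrix: d[i][j] is the smallest diameter among the
    sets in the collection that contain both i and j."""
    r = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        r[i][j] = r[j][i] = min(diam for s, diam in collection.items()
                                if i in s and j in s)
    return r

# Toy symmetric dissimilarity matrix on four objects (hypothetical data).
D = [[0, 1, 1, 2],
     [1, 0, 2, 1],
     [1, 2, 0, 3],
     [2, 1, 3, 0]]

clusters = build_collection(D)
recovered = reconstruct(clusters, len(D))
```

In this example the sets {0, 1}, {0, 2}, and {1, 3} overlap without being nested, which is the kind of structure a strict hierarchy cannot express. Whenever one collected set is a proper subset of another, its diameter is strictly smaller, matching the ordering in the definition.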
Similar resources
Clustering Gene Expression Data Using Random Forest Dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. Such data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods are built on the dissimilarity among observations, which is calculated by a distance function. As increa...
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
Clustering with Intelligent LINEX k-Means
Intelligent LINEX k-means clustering is a generalization of k-means clustering in which the number of clusters and their centroids can be determined while the LINEX loss function is used as the dissimilarity measure. Therefore, the selection of the centers in each cluster is not random. Choosing the LINEX dissimilarity measure helps the researcher to overestimate or undere...
Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques
The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
Common Dissimilarity Measures are Inappropriate for Time Series Clustering
Clustering algorithms have been actively used to identify similar time series, providing a better understanding of data. However, common clustering dissimilarity measures disregard time series correlations, yielding poor results. In this paper, we introduce a dissimilarity measure based on series partial autocorrelations. Experiments compare hierarchical clustering algorithms using the common d...
Publication date: 2007